|
The English-Arabic Parallel Corpus of United Nations Texts (EAPCOUNT) is one of the largest available parallel corpora involving the Arabic language. It is intended as a general research tool, available beyond the original project for applied and theoretical linguistic research. It began in 2006 as a PhD research project at the Department of Linguistics, University of Carthage, led by Dr. Hammouda Salhi (حمُّودة الصالحي) in collaboration with some of his students, and was completed in 2010. The full description of the corpus was completed in 2009 and revised in 2010.

The EAPCOUNT project responds to the unsatisfactory performance of general-purpose dictionaries (Zanettin, 2009), especially in translation studies and comparative research involving Arabic. It was also motivated by the increasing demand for cross-lingual research and information retrieval (Salhi, 2010).

The EAPCOUNT comprises 341 texts aligned at the paragraph level, that is, English texts together with their translational counterparts in Arabic. It consists of two subcorpora: one contains the English originals and the other their Arabic translations. The English subcorpus contains 3,794,677 word tokens and 78,606 word types. The Arabic subcorpus has slightly fewer word tokens (3,755,741) but far more word types (143,727). The whole corpus thus contains 7,550,418 tokens.

==Texts included in the EAPCOUNT==
The EAPCOUNT consists mainly, but not exclusively, of resolutions and annual reports issued by different UN organizations and institutions. Some texts are taken from the authoritative publications of another UN-like institution, the Inter-Parliamentary Union (IPU); these represent 2.18% of the tokens in the English subcorpus. The great majority of texts, however, were issued by the General Assembly and the Security Council (66.44% of source-language (SL) tokens).
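The figures reported so far can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the type-token ratio is a standard measure that the corpus description itself does not report, and the per-source token counts are approximations because the published percentages are rounded to two decimals.

```python
# Counts as reported in the corpus description.
ENGLISH_TOKENS = 3_794_677
ENGLISH_TYPES = 78_606
ARABIC_TOKENS = 3_755_741
ARABIC_TYPES = 143_727


def type_token_ratio(types: int, tokens: int) -> float:
    """Return distinct word types divided by running word tokens."""
    return types / tokens


def tokens_for_share(percent: float, total_tokens: int) -> int:
    """Approximate the token count behind a published percentage share."""
    return round(total_tokens * percent / 100)


# Size of the whole corpus: the sum of both subcorpora.
total_tokens = ENGLISH_TOKENS + ARABIC_TOKENS

# Lexical variety of each subcorpus (hypothetical illustration).
english_ttr = type_token_ratio(ENGLISH_TYPES, ENGLISH_TOKENS)
arabic_ttr = type_token_ratio(ARABIC_TYPES, ARABIC_TOKENS)

# Approximate English-subcorpus tokens per source, from the stated shares.
ipu_tokens = tokens_for_share(2.18, ENGLISH_TOKENS)
ga_sc_tokens = tokens_for_share(66.44, ENGLISH_TOKENS)

print(f"Total tokens:       {total_tokens:,}")
print(f"English TTR:        {english_ttr:.4f}")
print(f"Arabic TTR:         {arabic_ttr:.4f}")
print(f"IPU tokens (approx.):   {ipu_tokens:,}")
print(f"GA/SC tokens (approx.): {ga_sc_tokens:,}")
```

The sum reproduces the stated corpus total of 7,550,418 tokens. Because the two subcorpora are almost the same size, their type-token ratios can be compared directly, and the Arabic ratio comes out at nearly twice the English one.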
The assumption here is that target-language (TL) texts produced by these international bodies can be regarded as highly reliable translations. All texts were downloaded from first-hand sources (the official websites of these agencies) to ensure that the publications are kept in their original form.

Excerpt source: Wikipedia, the free encyclopedia.
|